Abstract

In this paper, we study a controllable tandem queueing system consisting of
two nodes and a controller, in which customers arrive according to a Poisson
process and must receive service at both nodes before leaving the system.
A decision maker dynamically allocates service resources to each node
according to the number of customers at that node. The objective is to
minimize the long-run average cost. We cast the problem as a Markov decision
problem, treat it with a dynamic programming approach, and derive the
monotonicity of the optimal allocation policy and the relationship between
the optimal policies of the two nodes. Furthermore, we obtain conditions
under which the optimal policy is unique and conditions under which it is a
bang-bang control policy.

Keywords: 

Markov decision problem. Tandem system. Optimal policy. Dynamic 
programming. Average costs 


1. Introduction 

We consider a controllable tandem queueing system consisting of two
nodes and a controller. A decision maker can assign a number of service
resources to each node. The study of controllable tandem queueing systems
is motivated by their wide applications in manufacturing, computer systems,
voice and data communications, and vehicular traffic flow. The theory of
controllable queueing systems has often been applied to the optimal control
of admission, servicing, dynamic pricing, routing and scheduling of jobs in
queues or networks of queues. These works are discussed in Stidham and
Weber (1993), Yang et al. (2011) and Çil et al. (2011). Controllable
queueing systems based on the theory of Markov, semi-Markov and regenerative
decision processes can be found in Morozov and Steyaert (2013). Using
queueing theory, one often casts such optimization problems as Markov
decision problems (MDP). To obtain properties of the optimal policy, one
first establishes properties (such as monotonicity and convexity) of the
relative value function (when the long-run average criterion is considered).
The key tool of the method is dynamic programming. For more details, see
Koole (1998) and Çil et al. (2009).

Motivated by these applications, service resource control has been
investigated in different queueing systems. Rykov and Efrosinin (2004)
considered a multi-server controllable queueing system with heterogeneous
servers and proved several monotonicity properties of optimal policies.
Iravani et al. (2007) studied optimal service scheduling in nonpreemptive
finite-population queueing systems. Yang et al. (2013) considered the
optimal resource allocation policy for single-queue systems. Efrosinin et
al. (2014) analyzed the optimal admission policy for a tandem queueing
system.

Of particular relevance to the present work are the works of Rosberg et
al. (1982) and Ahn et al. (2002), where only the customers' holding cost
was considered. Rosberg et al. (1982) considered the optimal control of
service in tandem queues where the service rate in node 1 can be selected
from a compact set and the service rate in node 2 is constant. Ahn et al.
(2002) discussed the optimal control of a two-stage tandem queueing system
with flexible servers, where only two flexible servers were considered under
two different scenarios, and obtained the exhaustive optimal policy. Kaufman
et al. (2005) considered the integration of an agile, temporary workforce
into a tandem queueing system in which the relationship between the service
rate and the number of service resources is linear and the service resource
costs in different nodes have the same cost function. In contrast to these
previous studies of the resource allocation control problem, the two nodes
in our model have different holding cost rates and different service
resource cost functions in the objective (the long-run average cost). The
main contribution of this paper is that we derive the monotonicity of the
optimal allocation policy and the relationship between the optimal policies
of the two nodes. Furthermore, we obtain conditions under which the optimal
policy is unique and conditions under which a bang-bang control policy is
optimal.

The rest of the paper is organized as follows. In Section 2 the model is
formulated in detail as a controllable Markov decision problem. The
characteristics of the optimization problem and the optimality equation are
derived in Section 3. In Section 4 structural properties of the optimal
policy and the main results of the paper are given. Finally, some further
discussions and conclusions are given in Section 5.

2. Model Description 

We consider a tandem queueing system with two nodes. Customers arrive
at node 1 from outside the system according to a Poisson process with
parameter $\lambda$ and have exponentially distributed service requirements
at each node. After receiving service at node 1, customers proceed
immediately to node 2 and receive service there before leaving the system.
A decision maker can assign a number of service resources to each node. The
service rate of a customer depends on the number of service resources
assigned to that customer. When a customer has been allocated $a$ units of
server resources, the service duration of that customer in node $i$ is
exponentially distributed with parameter $\mu_i(a)$, $i = 1, 2$, which is
strictly increasing in $a$. Without loss of generality, we assume that
$\mu_i(0) = 0$, $i = 1, 2$. At any decision epoch, the decision maker chooses
the number of server resources for node 1 from a compact set
$A = [0, a_{\max}]$ and, at the same time, for node 2 from a compact set
$B = [0, b_{\max}]$. Each node has a single infinite-size FCFS queue. The
interarrival and service times are assumed to be mutually independent. We
assume that the stability condition $\lambda < \mu_1(a_{\max})$,
$\lambda < \mu_2(b_{\max})$ holds. Figure 1 gives an illustration of the
system.

We consider the following cost structure in the system. Our objective is
to obtain a dynamic management policy that minimizes the long-run average
cost.

(1) resource cost: when node $i$ uses $a$ units of resources, a cost of
$c_i(a)$, $i = 1, 2$, is incurred by the system per unit time (here $c_i(a)$
is a continuous function that is strictly increasing in $a$; without loss of
generality, we assume that $c_i(0) = 0$, $i = 1, 2$).

(2) holding cost: holding costs are incurred at rates $h_1$ and $h_2$ per
unit time for each customer in node 1 and node 2, respectively.
Fig. 1 The controllable tandem queueing system


Let $X_i(t)$ denote the number of customers at node $i$, $i = 1, 2$. The
system evolves as a continuous-time Markov process

$$\{X(t), t \ge 0\} = \{(X_1(t), X_2(t)), t \ge 0\}.$$

The notation $l_i(x)$, $i = 1, 2$, will be used to specify the components
of the state vector $x \in E$, i.e., $l_i(x) = x_i$.

The system state space is $E = \{x = (x_1, x_2) \in N^2\}$, with
$N = \{0, 1, 2, \ldots\}$.

It is assumed that the model is stable and conservative. The transition
rate under a control action $(a, b)$ is given by

$$q_{xy}(a, b) =
\begin{cases}
\lambda, & y = x + e_1; \\
\mu_1(a), & y = x - e_1 + e_2,\ l_1(x) > 0; \\
\mu_2(b), & y = x - e_2,\ l_2(x) > 0; \\
0, & \text{else},
\end{cases}$$

where

$$q_{xy}(a, b) \ge 0,\ y \ne x, \qquad
q_{xx}(a, b) = -q_x(a, b) = -\sum_{y \ne x} q_{xy}(a, b), \qquad
q_x(a, b) < \infty.$$

Here $e_i$ is the 2-dimensional vector with 1 in the $i$th coordinate and 0
elsewhere, $i = 1, 2$.
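The transition structure can be written down directly in code. The following is a minimal sketch, not part of the original model description: the function names and the linear rate functions `mu1`, `mu2` are hypothetical choices made only for illustration.

```python
def transition_rates(x, a, b, lam, mu1, mu2):
    """Return {y: q_xy(a, b)} for all y != x with a positive rate."""
    x1, x2 = x
    rates = {(x1 + 1, x2): lam}             # y = x + e1: external arrival
    if x1 > 0:
        rates[(x1 - 1, x2 + 1)] = mu1(a)    # y = x - e1 + e2: node 1 completes
    if x2 > 0:
        rates[(x1, x2 - 1)] = mu2(b)        # y = x - e2: departure from node 2
    return {y: q for y, q in rates.items() if q > 0}


def exit_rate(x, a, b, lam, mu1, mu2):
    """Total rate q_x(a, b), the sum of q_xy(a, b) over y != x."""
    return sum(transition_rates(x, a, b, lam, mu1, mu2).values())


# Hypothetical parameters (assumptions, not taken from the paper).
LAM = 0.5
mu1 = lambda a: 2.0 * a   # strictly increasing with mu1(0) = 0
mu2 = lambda b: 3.0 * b   # strictly increasing with mu2(0) = 0
```

In the empty state only an arrival is possible, and an action with zero resources removes the corresponding service transition because $\mu_i(0) = 0$.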

The problem of the decision maker is to derive an optimal policy, based on
the number of customers in each node, that minimizes the long-run average
cost. We cast the resource management problem as a Markov decision problem.
The set of decision epochs corresponds to the set of all arrivals, service
completions, and dummy transitions due to uniformization. The controllable
system associated with a Markov process is a five-tuple

$$\{E,\ D = (A, B),\ Q(f),\ c_i(a),\ h_i\}, \quad i = 1, 2,$$

in which $Q(f)$ is the transition matrix of the queueing system under the
policy $f$.

We consider stationary Markov policies $f : E \to D$ with $f = (f_1, f_2)$.
Due to the Markov property, the optimal policy depends only on the current
state, regardless of $t$. More precisely, when the system state is
$x = (x_1, x_2)$, the controller takes the actions $f_1(x_1) = a \in A$ and
$f_2(x_2) = b \in B$. The amount of service resources assigned to node $i$
depends only on the current number of customers in node $i$.

3. Optimization problem and optimality equation 

For every fixed stationary policy $f$, we assume that the process
$\{X(t), t \ge 0\}$ with state space $E$ is an irreducible, positive
recurrent Markov process. As is known from Tijms (1994), for an ergodic
Markov process the long-run average cost per unit of time under the policy
$f$ coincides with the corresponding ensemble average,

$$g(f) = \lim_{t \to \infty} \frac{u^f(x, t)}{t}
= \sum_{i=0}^{\infty} \sum_{j=0}^{\infty}
\left[c_1(f_1(i)) + c_2(f_2(j)) + h_1 i + h_2 j\right] \pi_{ij}(f), \quad (1)$$

in which $u^f(x, t)$ denotes the total expected cost up to time $t$ when the
system starts in state $x$ and $\pi_{ij}(f)$ denotes the stationary
probability of state $(i, j)$ under policy $f$. The goal is to find a policy
$f^*$ that minimizes the long-run average cost:

$$g(f^*) = \min_f g(f). \quad (2)$$
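As a sanity check on formula (1), consider a simple threshold-free policy that is not analysed in the paper and is assumed here only for illustration: each node uses a fixed resource level whenever it is non-empty and none when it is empty. The process is then an ordinary tandem of M/M/1 queues with service rates `m1` and `m2`, whose stationary distribution has the well-known product form $\pi_{ij} = (1-\rho_1)\rho_1^i (1-\rho_2)\rho_2^j$ with $\rho_k = \lambda / m_k$. The sketch below evaluates (1) both in closed form and by a truncated double sum; all names and numbers are hypothetical.

```python
def average_cost_closed_form(lam, m1, m2, c1a, c2b, h1, h2):
    """g(f) for the 'serve at a fixed level when busy' policy.

    m1, m2   : service rates mu_1(a), mu_2(b) used when the node is busy
    c1a, c2b : resource cost rates c_1(a), c_2(b) paid while the node is busy
    """
    r1, r2 = lam / m1, lam / m2            # utilisations, both must be < 1
    return c1a * r1 + c2b * r2 + h1 * r1 / (1 - r1) + h2 * r2 / (1 - r2)


def average_cost_by_sum(lam, m1, m2, c1a, c2b, h1, h2, n_max=200):
    """Evaluate the double sum in (1), truncated at n_max customers per node."""
    r1, r2 = lam / m1, lam / m2
    total = 0.0
    for i in range(n_max):
        for j in range(n_max):
            pi_ij = (1 - r1) * r1 ** i * (1 - r2) * r2 ** j
            cost = (c1a if i > 0 else 0.0) + (c2b if j > 0 else 0.0) + h1 * i + h2 * j
            total += cost * pi_ij
    return total
```

With `lam=0.5, m1=1.0, m2=2.0, c1a=1.0, c2b=2.0, h1=1.0, h2=3.0` the closed form gives 0.5 + 0.5 + 1 + 1 = 3, and the truncated sum agrees up to a negligible tail.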

In order to find the optimal policy $f^*$ that minimizes the long-run
average cost, we construct a discrete-time equivalent of the original system
by using the standard tools of uniformization and normalization. Without
loss of generality, we assume that
$\lambda + \mu_1(a_{\max}) + \mu_2(b_{\max}) = 1$. Now we consider a
real-valued function $v(x)$ that plays the role of the relative value
function, i.e., the asymptotic difference in total costs that results from
starting the process in state $x$ instead of some reference state. As is
well known, the optimal policy $f$ and the optimal average cost $g$ are the
solutions of the optimality equation

$$Tv(x) = v(x) + g,$$

where $T$ is the dynamic programming operator acting on $v$, defined as
follows:

$$Tv(x) = \lambda v(x + e_1) + \sum_{i=1,2} T_i v(x) + \sum_{i=1,2} h_i l_i(x), \quad (3)$$

where

$$T_1 v(x) = \min_{a \in A} \{\mu_1(a) v(x - e_1 + e_2) + [\mu_1(a_{\max}) - \mu_1(a)] v(x) + c_1(a)\}, \quad (4)$$

$$T_2 v(x) = \min_{b \in B} \{\mu_2(b) v(x - e_2) + [\mu_2(b_{\max}) - \mu_2(b)] v(x) + c_2(b)\}. \quad (5)$$

The first term in the expression $Tv(x)$ models the arrival of customers to
node 1 from outside the system, and the last one the customer holding cost.
Similarly, the first term in the expression $T_1 v(x)$ corresponds to a
customer who finishes service in node 1 and moves to node 2, and the second
one to the dummy transition introduced by uniformization. The last term in
$T_1 v(x)$ is the resource cost in node 1. The first term in the expression
$T_2 v(x)$ corresponds to a customer who finishes service in node 2, and the
second one to the dummy transition introduced by uniformization. The last
term in $T_2 v(x)$ is the resource cost in node 2.
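One application of the operator $T$ in (3)-(5) can be sketched as follows. This is an illustrative approximation under stated assumptions, not the authors' implementation: the state space is truncated to a finite grid stored as a dictionary, the compact action sets are replaced by finite grids `a_grid` and `b_grid`, and any transition that would leave the grid (including a service completion at an empty node) is treated as a dummy transition. All names are hypothetical.

```python
def apply_T(v, lam, mu1, mu2, c1, c2, h1, h2, a_grid, b_grid):
    """Apply T once to v (a dict keyed by states (x1, x2)).

    Returns (Tv, policy), where policy[x] = (a, b) attains the minima in
    (4) and (5) on the action grids (smallest action on ties).
    Rates are assumed normalised: lam + mu1(a_max) + mu2(b_max) = 1.
    """
    a_top, b_top = a_grid[-1], b_grid[-1]
    new_v, policy = {}, {}
    for x in v:
        x1, x2 = x
        stay = v[x]
        up = v.get((x1 + 1, x2), stay)        # v(x + e1)
        move = v.get((x1 - 1, x2 + 1), stay)  # v(x - e1 + e2)
        out = v.get((x1, x2 - 1), stay)       # v(x - e2)
        # Equation (4): minimise over the node-1 action grid.
        t1, a = min((mu1(a) * move + (mu1(a_top) - mu1(a)) * stay + c1(a), a)
                    for a in a_grid)
        # Equation (5): minimise over the node-2 action grid.
        t2, b = min((mu2(b) * out + (mu2(b_top) - mu2(b)) * stay + c2(b), b)
                    for b in b_grid)
        # Equation (3): arrival term, service operators, holding costs.
        new_v[x] = lam * up + t1 + t2 + h1 * x1 + h2 * x2
        policy[x] = (a, b)
    return new_v, policy
```

Starting from $v \equiv 0$, one application returns the holding cost $h_1 x_1 + h_2 x_2$ in every state and selects zero resources everywhere, since with a constant $v$ only the resource cost depends on the action.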

According to (1), we can also solve another optimization problem: if
$c_i = 0$ and $h_i = 1$, $i = 1, 2$, then (2) is equivalent to minimizing
the mean number of customers in the queueing system.

4. Structural properties of the optimal policy 

In this section, we focus on the structure of the optimal policy. The
optimal policy possesses structural properties that provide fundamental
insight and also allow one to determine the optimal policy with less
computational effort, because they reduce the solution search space.

In order to study this structure, in principle, one needs to solve the
optimality equation $Tv(x) = v(x) + g$. However, it is hard to solve
analytically in practice. A solution can be obtained by recursively defining
$v_{n+1} = T v_n$ for arbitrary $v_0$. We know that the actions converge to
the optimal policy as $n \to \infty$. For existence and convergence of the
solutions and of the optimal policy we refer to Aviv and Federgruen (1999)
and Sennott (2009). The backward recursion equation is given by

$$v_{n+1}(x) = \lambda v_n(x + e_1) + \sum_{i=1,2} T_i v_n(x) + \sum_{i=1,2} h_i l_i(x).$$
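The backward recursion can be run numerically as relative value iteration. The sketch below is an approximation under assumptions that are not part of the paper: a truncated state grid $\{0, \ldots, n_{\max}\}^2$, finite action grids, transitions leaving the grid treated as dummy transitions, and the value at the reference state $(0, 0)$ subtracted after each step to keep the iterates bounded. The function name and all parameters are hypothetical.

```python
def relative_value_iteration(lam, mu1, mu2, c1, c2, h1, h2,
                             a_grid, b_grid, n_max, tol=1e-9, max_iter=20000):
    """Iterate v_{n+1} = T v_n; return (g_estimate, v, policy)."""
    a_top, b_top = a_grid[-1], b_grid[-1]
    # The recursion is written for normalised rates (uniformisation constant 1).
    assert abs(lam + mu1(a_top) + mu2(b_top) - 1.0) < 1e-12
    states = [(i, j) for i in range(n_max + 1) for j in range(n_max + 1)]
    v = {x: 0.0 for x in states}              # v_0 = 0
    g, policy = 0.0, {}
    for _ in range(max_iter):
        new_v = {}
        for x in states:
            x1, x2 = x
            stay = v[x]
            up = v.get((x1 + 1, x2), stay)
            move = v.get((x1 - 1, x2 + 1), stay)
            out = v.get((x1, x2 - 1), stay)
            t1, a = min((mu1(a) * move + (mu1(a_top) - mu1(a)) * stay + c1(a), a)
                        for a in a_grid)
            t2, b = min((mu2(b) * out + (mu2(b_top) - mu2(b)) * stay + c2(b), b)
                        for b in b_grid)
            new_v[x] = lam * up + t1 + t2 + h1 * x1 + h2 * x2
            policy[x] = (a, b)
        diffs = [new_v[x] - v[x] for x in states]
        g = 0.5 * (min(diffs) + max(diffs))   # midpoint of the bounds on g
        shift = new_v[(0, 0)]                 # renormalise at the reference state
        v = {x: new_v[x] - shift for x in states}
        if max(diffs) - min(diffs) < tol:     # span small enough: stop
            break
    return g, v, policy
```

A possible call uses $\lambda = 0.2$, $\mu_1(a) = 0.4a$, $\mu_2(b) = 0.4b$ on $[0, 1]$, $c_i(a) = a$, $h_1 = 1$, $h_2 = 2$ and `n_max = 5`. The returned `policy` can then be inspected state by state; near the truncation boundary it may deviate from the behaviour of the untruncated model.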
For ease of notation, we define the set of optimal actions in state $x$ by

$$f(x) = (f_1(x_1), f_2(x_2)), \qquad f_1(x_1) = \arg T_1 v(x), \qquad f_2(x_2) = \arg T_2 v(x).$$


By using the optimality equation, we obtain the following properties of the
relative value function.

Property 4.1 (non-decreasingness)

(i) $v(x + e_i) \ge v(x)$, $i = 1, 2$, for all $x \in E$,

(ii) if $2h_2 \ge h_1$, then $v(x - e_1 + e_2) \ge v(x - e_2)$ for all
$x = (x_1, x_2) \in E$ with $x_1 \ge 1$, $x_2 \ge 1$,

(iii) if $h_1 \ge h_2$, then $v(x) \ge v(x - e_1 + e_2)$ for all
$x = (x_1, x_2) \in E$ with $x_1 \ge 1$, $x_2 \ge 1$.

Property 4.2 (quasi-convexity)

(i) $v(x + e_2) - 2v(x) + v(x - e_2) \ge 0$ for all $x = (x_1, x_2) \in E$
with $x_2 \ge 1$,

(ii) $v(x + e_1 - e_2) - 2v(x) + v(x - e_1 + e_2) \ge 0$ for all
$x = (x_1, x_2) \in E$ with $x_1 \ge 1$, $x_2 \ge 1$.

Next we show some structural properties of the optimal policy, based on
the structural properties of the relative value function above.

Theorem 1. The optimal policy has the monotonicity property, i.e.,

(i) if $b_1 \in \arg T_2 v(x + e_2)$ and $b_2 \in \arg T_2 v(x)$, then
$b_1 \ge b_2$ for all $x = (x_1, x_2) \in E$;

(ii) if $a_1 \in \arg T_1 v(x + e_1)$ and $a_2 \in \arg T_1 v(x)$, then
$a_1 \ge a_2$ for all $x = (x_1, x_2) \in E$.

The proof of Property 4.1 is given in Appendix A. The proofs of Property
4.2 and Theorem 1 are given in Appendix B.

Based on Property 4.1, we give the relationship between the optimal
policies of the two nodes under some conditions.

Theorem 2. Assume that $c_1(a) - c_1(b) \ge c_2(a) - c_2(b)$ and
$\mu_2(a) - \mu_2(b) \ge \mu_1(a) - \mu_1(b)$ when $a > b$. Then if
$a \in \arg T_1 v(x)$ and $b \in \arg T_2 v(x)$, we have $b \ge a$ for all
$x = (x_1, x_2) \in E$ with $x_1 \ge 1$, $x_2 \ge 1$.

Proof. Let $(a \in \arg T_1 v(x), b \in \arg T_2 v(x))$ be an arbitrary
optimal policy for nodes 1 and 2 in state $x$, respectively. The proof is by
contradiction. Suppose that $b < a$, and compare the policy $(a, b)$ with the
policy $(b, a)$. Writing $T_1^{a} v(x)$ and $T_2^{b} v(x)$ for the
expressions inside the minima in (4) and (5) evaluated at the actions $a$ and
$b$, we have

$$\begin{aligned}
& T_1^{a} v(x) + T_2^{b} v(x) - T_1^{b} v(x) - T_2^{a} v(x) \\
&= [\mu_1(a) v(x - e_1 + e_2) + [\mu_1(a_{\max}) - \mu_1(a)] v(x) + c_1(a)] \\
&\quad + [\mu_2(b) v(x - e_2) + [\mu_2(b_{\max}) - \mu_2(b)] v(x) + c_2(b)] \\
&\quad - [\mu_1(b) v(x - e_1 + e_2) + [\mu_1(a_{\max}) - \mu_1(b)] v(x) + c_1(b)] \\
&\quad - [\mu_2(a) v(x - e_2) + [\mu_2(b_{\max}) - \mu_2(a)] v(x) + c_2(a)] \\
&= [\mu_1(a) - \mu_1(b)][v(x - e_1 + e_2) - v(x)] - [\mu_2(a) - \mu_2(b)][v(x - e_2) - v(x)] \\
&\quad + c_1(a) - c_1(b) - c_2(a) + c_2(b) \\
&\ge [\mu_1(a) - \mu_1(b)][v(x - e_1 + e_2) - v(x - e_2)] + c_1(a) - c_1(b) - c_2(a) + c_2(b) \\
&\ge 0.
\end{aligned}$$

The first equality is based on the definition of the operators $T_1$ and
$T_2$. The second equality follows by rearranging the terms. The first
inequality follows from the condition
$\mu_2(a) - \mu_2(b) \ge \mu_1(a) - \mu_1(b)$ when $a > b$ together with
$v(x) \ge v(x - e_2)$ (Property 4.1 (i)). The last inequality follows from
Property 4.1 (ii) and the condition on the resource costs. This implies that
$a$ and $b$ are not optimal for nodes 1 and 2 in state $x$, respectively.
Hence, $b \ge a$.

From the above theorem we can conclude that, under some conditions, the
optimal amount of service resources allocated to node 1 is no larger than
that allocated to node 2. We find that the optimal amount of resources
allocated to each node depends on the resource cost variation
$c(a) - c(b)$ and the service rate variation $\mu(a) - \mu(b)$ in each node.

We are now ready to give some conditions under which the optimal policy 
is unique and is a bang-bang control policy. 

Theorem 3. The following properties hold:

(i) if the functions $m_1(a) = \frac{c_1'(a)}{\mu_1'(a)}$ and
$m_2(b) = \frac{c_2'(b)}{\mu_2'(b)}$ are monotone on $a \in A$, $b \in B$,
then the optimal policy is unique.

(ii) $\arg T_1 v(0) = \{0\}$, $\arg T_2 v(0) = \{0\}$.

(iii) if the functions $m_1(a)$ and $m_2(b)$ are non-increasing, and
[...] $>$ [...] for all $a \in (0, a_{\max})$, $b \in (0, b_{\max})$, then
the optimal policy is a bang-bang control policy, i.e.,
$\arg T_1 v(x) = \{0, a_{\max}\}$, $\arg T_2 v(x) = \{0, b_{\max}\}$ for all
$x \in E$.
Proof. To prove part (i), we consider the optimal action $a$ for the service
resource allocation in node 1. The event operator $T_1$ for node 1 defined
in equation (4) gives the following minimization problem:

$$T_1 v(x) = \min_{a \in A} \{\mu_1(a) v(x - e_1 + e_2) + [\mu_1(a_{\max}) - \mu_1(a)] v(x) + c_1(a)\}.$$

Rearranging the first-order optimality condition of the above problem, we
have

$$\frac{c_1'(a)}{\mu_1'(a)} = v(x) - v(x - e_1 + e_2).$$

Because the resource allocation action satisfies $a \in A = [0, a_{\max}]$,
the optimal action $a$ must be the solution of the above equation. Since the
function $m_1(a) = \frac{c_1'(a)}{\mu_1'(a)}$ is monotone on $a \in A$, there
is a unique $a$ solving the above equation. Hence the optimal policy for
node 1 is unique. Part (i) for node 2 can be proved in a similar manner.
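The first-order condition can be illustrated numerically. The sketch below assumes, purely for illustration, the hypothetical choices $c_1(a) = a^2$ and $\mu_1(a) = a$, so that $m_1(a) = 2a$ is strictly increasing; `delta` stands for the right-hand side $v(x) - v(x - e_1 + e_2)$, which is treated here as a given number. The equation $m_1(a) = \Delta$ is solved by bisection and clipped to $[0, a_{\max}]$ when no interior solution exists, and the result is compared with a direct grid search over the action-dependent part of $T_1^a v(x)$.

```python
def solve_first_order(m, delta, a_max, tol=1e-10):
    """Solve m(a) = delta on [0, a_max] by bisection, for increasing m.

    Returns 0 or a_max when the equation has no interior solution.
    """
    if m(0.0) >= delta:
        return 0.0
    if m(a_max) <= delta:
        return a_max
    lo, hi = 0.0, a_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if m(mid) < delta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def grid_minimiser(mu, c, delta, a_max, n=2000):
    """Minimise c(a) - mu(a) * delta, the a-dependent part of T_1^a v(x)."""
    grid = [a_max * k / n for k in range(n + 1)]
    return min(grid, key=lambda a: c(a) - mu(a) * delta)


# Hypothetical example: c_1(a) = a^2, mu_1(a) = a, hence m_1(a) = 2a.
m1 = lambda a: 2.0 * a
a_star = solve_first_order(m1, delta=1.0, a_max=2.0)
```

For $\Delta = 1$ both approaches give $a = 0.5$. For large $\Delta$ the solution is clipped to $a_{\max}$, and for $\Delta \le 0$ it is 0.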


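To prove part (ii), we consider the optimal action $a$ for the service
resource allocation in node 1. In the empty state no service completion can
occur, so the problem defined in equation (4) becomes

$$T_1 v(0) = \min_{a \in A} \{\mu_1(a) v(0) + [\mu_1(a_{\max}) - \mu_1(a)] v(0) + c_1(a)\}
= \mu_1(a_{\max}) v(0) + \min_{a \in A} c_1(a).$$

Since $c_1$ is strictly increasing with $c_1(0) = 0$, this immediately
implies that $\arg T_1 v(0) = \{0\}$. Part (ii) for node 2, namely
$\arg T_2 v(0) = \{0\}$, can be proved in a similar manner.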

To prove part (iii), we consider the optimal action $a$ for the service
resource allocation in node 1. Since the service resources in node 1 are
chosen from the compact set $[0, a_{\max}]$, the optimal action $a$ in node 1
is either 0, or $a_{\max}$, or satisfies the following equation:

$$\frac{c_1'(a)}{\mu_1'(a)} = v(x) - v(x - e_1 + e_2).$$

We argue by contradiction. Assume that $a \in \arg T_1 v(x)$ with
$a \in (0, a_{\max})$ for some $x \in E$. Writing $T_1^{a} v(x)$ for the
expression inside the minimum in (4) evaluated at the action $a$, for any
$\varepsilon > 0$ we have

$$T_1^{a+\varepsilon} v(x) - T_1^{a} v(x)
= [\mu_1(a + \varepsilon) - \mu_1(a)][v(x - e_1 + e_2) - v(x)] + c_1(a + \varepsilon) - c_1(a) \ge 0,$$

which implies that

$$v(x) - v(x - e_1 + e_2) \le \frac{c_1(a + \varepsilon) - c_1(a)}{\mu_1(a + \varepsilon) - \mu_1(a)}.$$

Since the function $m_1(a) = \frac{c_1'(a)}{\mu_1'(a)}$ is non-increasing,
this bound contradicts the condition assumed in part (iii). So there is no
$a$ satisfying the above equation. That is, the optimal policy in node 1 is
$\arg T_1 v(x) = \{0, a_{\max}\}$. Thus, the optimal policy is a bang-bang
control policy. Part (iii) for node 2 can be proved in a similar manner.
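The bang-bang behaviour can be seen in a small numerical example. The sketch assumes the hypothetical choices $c_1(a) = \sqrt{a}$ and $\mu_1(a) = a$ on $[0, 1]$, for which $m_1(a) = 1/(2\sqrt{a})$ is decreasing; these functions are illustrative assumptions and are not claimed to satisfy every condition of Theorem 3 (iii). The action-dependent part of $T_1^a v(x)$ is then concave in $a$, so a grid search returns an endpoint of the action set whatever the value of `delta` $= v(x) - v(x - e_1 + e_2)$.

```python
import math


def best_action(delta, mu, c, grid):
    """Grid minimiser of c(a) - mu(a) * delta, the a-dependent part of T_1^a v(x)."""
    return min(grid, key=lambda a: c(a) - mu(a) * delta)


# Hypothetical concave cost and linear rate on A = [0, 1].
GRID = [k / 100 for k in range(101)]
cost = math.sqrt
rate = lambda a: a

# For each value difference delta, record the minimising action.
choices = {d: best_action(d, rate, cost, GRID) for d in (0.0, 0.5, 0.9, 1.5, 2.0, 5.0)}
```

Every entry of `choices` is either 0.0 or 1.0: no resources are used while the benefit `delta` of moving a customer forward is small, and the maximum level is used once it is large enough.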


5. Conclusion

In this paper we have analysed the optimal control of server resources in a
tandem queueing system with two nodes. The controller can dynamically decide
how much service resource to allocate to each node at any decision epoch.
Applying dynamic programming to the model, we not only give some traditional
properties of the relative value function and the optimal policy, but also
derive conditions under which the optimal policy is unique and under which
bang-bang control occurs. In particular, we have provided the relationship
between the optimal policies of the two nodes, which gives the controller
more information for managing the system.

The above results suggest some interesting extensions of the model which we
may study in the near future.

(i) One possible change is to consider a model where each node's service
resource decision depends on the number of customers in both queues. When
the system state is $x = (x_1, x_2)$, the controller takes the actions
$f_1(x_1, x_2) = a \in A$, $f_2(x_1, x_2) = b \in B$. Although the analysis is
difficult, we may obtain further properties of the optimal policy. In our
model the two nodes have their own action sets. We can also study a model in
which the two nodes share common server resources.

(ii) Another way to generalize the model is to include features such as
retrials, feedback and priority customers. The model may then become more
complex, and other methods should be considered. In our model the customers
arrive at the system according to a Poisson process and the service time of
a customer is exponentially distributed. We can apply embedded Markov chains
and semi-Markov decision processes to study queueing systems in which the
service time of a customer has a general distribution.

(iii) In addition, the tandem queueing system with $n$ nodes is also worth
considering. Based on our model, we can study the relationship between the
optimal policies of the nodes.

Appendix A 

Property 4.1 (non-decreasingness) 

Proof. We prove Property 4.1 (i) by induction on $n$ in $v_n$. Define
$v_0(x) = 0$ for all states $x \in E$. This function obviously satisfies (i).
Now, we assume that (i) holds for the function $v_n(x)$, $x \in E$, and some
$n \in N$. We should prove that $v_{n+1}(x)$ satisfies the non-decreasing
property as well. For $i = 1$, we get

$$v_{n+1}(x + e_1) - v_{n+1}(x)
= \lambda [v_n(x + 2e_1) - v_n(x + e_1)] + h_1 + \sum_{i=1,2} T_i v_n(x + e_1) - \sum_{i=1,2} T_i v_n(x).$$

The first term of the right-hand side is non-negative by the induction
hypothesis, and the second term, the holding cost difference $h_1$, is
obviously positive.

Let $(a \in \arg T_1 v_n(x + e_1), b \in \arg T_2 v_n(x + e_1))$ be an
arbitrary optimal policy for nodes 1 and 2 in state $x + e_1$, respectively.
Taking the same, potentially suboptimal, actions in state $x$, we get

$$\begin{aligned}
& \sum_{i=1,2} T_i v_n(x + e_1) - \sum_{i=1,2} T_i v_n(x) \\
&\ge \mu_1(a)[v_n(x + e_2) - v_n(x + e_2 - e_1)] \\
&\quad + \mu_2(b)[v_n(x - e_2 + e_1) - v_n(x - e_2)] \\
&\quad + [\mu_1(a_{\max}) - \mu_1(a) + \mu_2(b_{\max}) - \mu_2(b)][v_n(x + e_1) - v_n(x)] \\
&\ge 0.
\end{aligned}$$

Therefore, Property 4.1 (i) holds by induction for every $n$, and $v(x)$ is
a non-decreasing function. Property 4.1 (i) for $i = 2$ can be proved in a
similar manner.

The proof of Property 4.1 (ii) is similar to the one above. Define
$v_0(x) = 0$ for all states $x \in E$. This function obviously satisfies
(ii). Now, we assume that (ii) holds for the function $v_n(x)$, $x \in E$,
and some $n \in N$. We should prove that $v_{n+1}(x)$ satisfies Property 4.1
(ii) as well. We have

$$\begin{aligned}
& v_{n+1}(x - e_1 + e_2) - v_{n+1}(x - e_2) \\
&= \lambda [v_n(x + e_2) - v_n(x + e_1 - e_2)] + 2h_2 - h_1 \\
&\quad + \sum_{i=1,2} T_i v_n(x - e_1 + e_2) - \sum_{i=1,2} T_i v_n(x - e_2).
\end{aligned}$$

Since the condition $2h_2 \ge h_1$ holds, the second term of the right-hand
side, $2h_2 - h_1$, is non-negative.

Let $(a \in \arg T_1 v_n(x - e_1 + e_2), b \in \arg T_2 v_n(x - e_1 + e_2))$
be an arbitrary optimal policy for nodes 1 and 2 in state $x - e_1 + e_2$,
respectively. Taking the same, potentially suboptimal, actions in state
$x - e_2$, we get

$$\begin{aligned}
& \sum_{i=1,2} T_i v_n(x - e_1 + e_2) - \sum_{i=1,2} T_i v_n(x - e_2) \\
&\ge \mu_1(a)[v_n(x - 2e_1 + 2e_2) - v_n(x - e_1)] \\
&\quad + \mu_2(b)[v_n(x - e_1) - v_n(x - 2e_2)] \\
&\quad + [\mu_1(a_{\max}) - \mu_1(a)][v_n(x - e_1 + e_2) - v_n(x - e_2)] \\
&\quad + [\mu_2(b_{\max}) - \mu_2(b)][v_n(x - e_1 + e_2) - v_n(x - e_2)] \\
&\ge 0.
\end{aligned}$$

Therefore, Property 4.1 (ii) holds by induction for every $n$, and we have
$v(x - e_1 + e_2) \ge v(x - e_2)$ for all $x = (x_1, x_2) \in E$ with
$x_1 \ge 1$, $x_2 \ge 1$. Property 4.1 (iii) can be proved in a similar
manner.

Appendix B 

Property 4.2 (quasi-convexity) (i) and Theorem 1 (i)

Proof. To prove Property 4.2 (i), we assume that it holds for the function
$v_n(x)$, $x \in E$, and some $n \in N$. Then we need to prove that Property
4.2 (i) also holds for $n + 1$. When $x = (x_1, x_2) \in E$ and $x_2 \ge 1$,
we have

$$\begin{aligned}
& v_{n+1}(x + e_2) - 2v_{n+1}(x) + v_{n+1}(x - e_2) \\
&= \lambda [v_n(x + e_2 + e_1) - 2v_n(x + e_1) + v_n(x + e_1 - e_2)] \\
&\quad + \sum_{i=1,2} T_i v_n(x + e_2) - 2\sum_{i=1,2} T_i v_n(x) + \sum_{i=1,2} T_i v_n(x - e_2) \\
&\ge \sum_{i=1,2} T_i v_n(x + e_2) - 2\sum_{i=1,2} T_i v_n(x) + \sum_{i=1,2} T_i v_n(x - e_2).
\end{aligned}$$

The inequality holds by the induction hypothesis. The optimal policy of
node 1 depends only on the number of customers in node 1, and the states
$x + e_2$ and $x - e_2$ have the same first entry $x_1$. Hence, they have the
same optimal policy in node 1. We assume that
$a \in \arg T_1 v_n(x + e_2)$, $b_1 \in \arg T_2 v_n(x + e_2)$,
$a \in \arg T_1 v_n(x - e_2)$, $b_2 \in \arg T_2 v_n(x - e_2)$. Therefore, we
get

$$\begin{aligned}
& \sum_{i=1,2} T_i v_n(x + e_2) - 2\sum_{i=1,2} T_i v_n(x) + \sum_{i=1,2} T_i v_n(x - e_2) \\
&\ge \mu_1(a)[v_n(x - e_1 + 2e_2) - 2v_n(x - e_1 + e_2) + v_n(x - e_1)] \\
&\quad + [\mu_1(a_{\max}) - \mu_1(a)][v_n(x + e_2) - 2v_n(x) + v_n(x - e_2)] \\
&\quad + [\mu_2(b_1) - \mu_2(b_2)][v_n(x) - v_n(x - e_2)] \\
&\quad + \mu_2(b_2)[v_n(x) - 2v_n(x - e_2) + v_n(x - 2e_2)] \\
&\quad + [\mu_2(b_{\max}) - \mu_2(b_1)][v_n(x + e_2) - v_n(x)] \\
&\quad + [\mu_2(b_{\max}) - \mu_2(b_2)][v_n(x - e_2) - v_n(x)] \\
&= \mu_1(a)[v_n(x - e_1 + 2e_2) - 2v_n(x - e_1 + e_2) + v_n(x - e_1)] \\
&\quad + [\mu_1(a_{\max}) - \mu_1(a)][v_n(x + e_2) - 2v_n(x) + v_n(x - e_2)] \\
&\quad + \mu_2(b_2)[v_n(x) - 2v_n(x - e_2) + v_n(x - 2e_2)] \\
&\quad + [\mu_2(b_{\max}) - \mu_2(b_1)][v_n(x + e_2) - 2v_n(x) + v_n(x - e_2)] \\
&\ge 0.
\end{aligned}$$

The first inequality follows by taking a potentially suboptimal action in
the second term of
$\sum_{i=1,2} T_i v_n(x + e_2) - 2\sum_{i=1,2} T_i v_n(x) + \sum_{i=1,2} T_i v_n(x - e_2)$.
The equality follows by rearranging the terms. The last inequality follows
by the induction hypothesis. Hence, we have
$v(x + e_2) - 2v(x) + v(x - e_2) \ge 0$.

For Theorem 1 (i), let $(b_1 \in \arg T_2 v(x + e_2), b_2 \in \arg T_2 v(x))$
be an optimal policy for node 2 in states $x + e_2$ and $x$, respectively.
The proof is by contradiction. Suppose that $b_1 < b_2$. Writing
$T_2^{b} v(x)$ for the expression inside the minimum in (5) evaluated at the
action $b$, we have

$$T_2^{b_1} v(x) - T_2^{b_2} v(x)
= [\mu_2(b_2) - \mu_2(b_1)][v(x) - v(x - e_2)] - [c_2(b_2) - c_2(b_1)] \ge 0.$$

Since Property 4.2 (i) above and $\mu_2(b_2) - \mu_2(b_1) > 0$ hold, we have

$$\begin{aligned}
& T_2^{b_1} v(x + e_2) - T_2^{b_2} v(x + e_2) \\
&= [\mu_2(b_2) - \mu_2(b_1)][v(x + e_2) - v(x)] - [c_2(b_2) - c_2(b_1)] \\
&\ge [\mu_2(b_2) - \mu_2(b_1)][v(x) - v(x - e_2)] - [c_2(b_2) - c_2(b_1)] \\
&\ge 0.
\end{aligned}$$

However, this implies that $b_1$ is not an optimal policy for node 2 in
state $x + e_2$. Hence $b_1 \ge b_2$.
Property 4.2 (quasi-convexity) (ii) and Theorem 1 (ii)

To prove Property 4.2 (ii), we assume that it holds for the function
$v_n(x)$, $x \in E$, and some $n \in N$. Then we need to prove that Property
4.2 (ii) also holds for $n + 1$. When $x = (x_1, x_2) \in E$ and $x_1 \ge 1$,
$x_2 \ge 1$, we have

$$\begin{aligned}
& v_{n+1}(x + e_1 - e_2) - 2v_{n+1}(x) + v_{n+1}(x - e_1 + e_2) \\
&= \lambda [v_n(x + 2e_1 - e_2) - 2v_n(x + e_1) + v_n(x + e_2)] \\
&\quad + \sum_{i=1,2} T_i v_n(x + e_1 - e_2) - 2\sum_{i=1,2} T_i v_n(x) + \sum_{i=1,2} T_i v_n(x - e_1 + e_2) \\
&\ge \sum_{i=1,2} T_i v_n(x + e_1 - e_2) - 2\sum_{i=1,2} T_i v_n(x) + \sum_{i=1,2} T_i v_n(x - e_1 + e_2) \\
&= T_1 v_n(x + e_1 - e_2) - 2T_1 v_n(x) + T_1 v_n(x - e_1 + e_2) \\
&\quad + T_2 v_n(x + e_1 - e_2) - 2T_2 v_n(x) + T_2 v_n(x - e_1 + e_2).
\end{aligned}$$

The inequality above holds by the induction hypothesis. Now, we assume
that $a_1 \in \arg T_1 v_n(x + e_1 - e_2)$,
$b_1 \in \arg T_2 v_n(x + e_1 - e_2)$,
$a_2 \in \arg T_1 v_n(x - e_1 + e_2)$,
$b_2 \in \arg T_2 v_n(x - e_1 + e_2)$. Then, we get

$$\begin{aligned}
& T_1 v_n(x + e_1 - e_2) - 2T_1 v_n(x) + T_1 v_n(x - e_1 + e_2) \\
&\ge \mu_1(a_1)[v_n(x) - v_n(x - e_1 + e_2)] \\
&\quad + \mu_1(a_2)[v_n(x - 2e_1 + 2e_2) - v_n(x - e_1 + e_2)] \\
&\quad + [\mu_1(a_{\max}) - \mu_1(a_1)][v_n(x + e_1 - e_2) - v_n(x)] \\
&\quad + [\mu_1(a_{\max}) - \mu_1(a_2)][v_n(x - e_1 + e_2) - v_n(x)] \\
&= \mu_1(a_2)[v_n(x - 2e_1 + 2e_2) - 2v_n(x - e_1 + e_2) + v_n(x)] \\
&\quad + [\mu_1(a_{\max}) - \mu_1(a_1)][v_n(x + e_1 - e_2) - 2v_n(x) + v_n(x - e_1 + e_2)] \\
&\ge 0.
\end{aligned}$$

The first inequality follows by taking a potentially suboptimal action in
the second term of
$T_1 v_n(x + e_1 - e_2) - 2T_1 v_n(x) + T_1 v_n(x - e_1 + e_2)$. The equality
follows by rearranging the terms. The last inequality follows by the
induction hypothesis. Similarly,

$$\begin{aligned}
& T_2 v_n(x + e_1 - e_2) - 2T_2 v_n(x) + T_2 v_n(x - e_1 + e_2) \\
&\ge \mu_2(b_1)[v_n(x + e_1 - 2e_2) - v_n(x - e_2)] \\
&\quad + \mu_2(b_2)[v_n(x - e_1) - v_n(x - e_2)] \\
&\quad + [\mu_2(b_{\max}) - \mu_2(b_1)][v_n(x + e_1 - e_2) - v_n(x)] \\
&\quad + [\mu_2(b_{\max}) - \mu_2(b_2)][v_n(x - e_1 + e_2) - v_n(x)] \\
&= \mu_2(b_2)[v_n(x + e_1 - 2e_2) - 2v_n(x - e_2) + v_n(x - e_1)] \\
&\quad + [\mu_2(b_{\max}) - \mu_2(b_2)][v_n(x + e_1 - e_2) - 2v_n(x) + v_n(x - e_1 + e_2)] \\
&\quad + [\mu_2(b_1) - \mu_2(b_2)][v_n(x + e_1 - 2e_2) - v_n(x + e_1 - e_2)] \\
&\ge 0.
\end{aligned}$$

The first inequality follows by taking a potentially suboptimal action in
the second term of the expression above. The equality follows by rearranging
the terms. The last inequality follows by the induction hypothesis and by
Theorem 1 (i), from which we know that $b_1 \le b_2$, so that
$\mu_2(b_1) - \mu_2(b_2) \le 0$. From Property 4.1, we know that
$v_n(x + e_1 - 2e_2) - v_n(x + e_1 - e_2) \le 0$. Thus, we derive that
$[\mu_2(b_1) - \mu_2(b_2)][v_n(x + e_1 - 2e_2) - v_n(x + e_1 - e_2)] \ge 0$,
and the last inequality follows.

For Theorem 1 (ii), let
$(a_1 \in \arg T_1 v(x + e_1 - e_2), a_2 \in \arg T_1 v(x))$ be an optimal
policy for node 1 in states $x + e_1 - e_2$ and $x$, respectively. The proof
is by contradiction. Suppose that $a_1 < a_2$. Writing $T_1^{a} v(x)$ for the
expression inside the minimum in (4) evaluated at the action $a$, we have

$$T_1^{a_1} v(x) - T_1^{a_2} v(x)
= [\mu_1(a_2) - \mu_1(a_1)][v(x) - v(x - e_1 + e_2)] - [c_1(a_2) - c_1(a_1)] \ge 0.$$

From Property 4.2 (ii) above and $\mu_1(a_2) - \mu_1(a_1) > 0$, we have

$$\begin{aligned}
& T_1^{a_1} v(x + e_1 - e_2) - T_1^{a_2} v(x + e_1 - e_2) \\
&= [\mu_1(a_2) - \mu_1(a_1)][v(x + e_1 - e_2) - v(x)] - [c_1(a_2) - c_1(a_1)] \\
&\ge [\mu_1(a_2) - \mu_1(a_1)][v(x) - v(x - e_1 + e_2)] - [c_1(a_2) - c_1(a_1)] \\
&\ge 0.
\end{aligned}$$

However, this implies that $a_1$ is not an optimal policy for node 1 in
state $x + e_1 - e_2$. Hence $a_1 \ge a_2$.

The optimal policy of node 1 depends only on the number of customers in
node 1, and the states $x + e_1$ and $x + e_1 - e_2$ have the same first
entry $x_1 + 1$. So they have the same optimal policy $a_1$ in node 1, i.e.,
$a_1 \in \arg T_1 v(x + e_1)$. Thus, if $a_1 \in \arg T_1 v(x + e_1)$ and
$a_2 \in \arg T_1 v(x)$ hold, then $a_1 \ge a_2$ for all
$x = (x_1, x_2) \in E$.